Module 4 - Exploratory Data Analysis

Learning Objectives

  • Understand how Exploratory Data Analysis (EDA) informs Data Preparation and Modeling
  • Visualize variation and covariation in categorical and numerical data
  • Data checks, data cleaning, outliers, missing data, summary statistics
  • Layers and stats in ggplot2
  • Reproducible Examples

Readings

  • RDS (R for Data Science): Chapters 8-11 (Chapter 10 is the most substantial part of the reading assignment)

Additional Reading:

There are several classic texts on Exploratory Data Analysis. These are somewhat dated but contain important insights:

  • Exploratory Data Analysis. J. W. Tukey. (1977).
  • Visualizing Data. W. S. Cleveland. (1993). Hobart Press.
  • Exploratory Data Mining and Data Cleaning. T. Dasu and T. Johnson. (2003). John Wiley & Sons.

Here is an example of a newer book which is also excellent:

  • Exploratory Data Analysis Using R. Ronald K. Pearson. CRC Press. (2018).

Videos

  • Whole Game: Live demo of an Exploratory Data Analysis by Hadley Wickham. The video isn't highly polished, but it's valuable to watch an expert work through their process. The code he typed and the data he used can be found on GitHub. The video focuses more on exploring and finding patterns than on finding problems in a poor dataset.